ABSTRACT

We present a method for fitting radioimmunoassay calibration curves, which
are used for measuring the concentration of various antigens in vitro. The curves
to be fitted are modified hyperbolae, estimated on the basis of only a few
observations (typically 12 to 16). Previous methods of fitting involved either
linearizing the curve and estimating by least squares or fitting directly by
nonlinear least squares. Unfortunately, the linearization techniques used are not
usually successful in their intent and, furthermore, outliers are quite common
due to the large number of sources of error.

We present the algorithm we devised and used successfully for finding
M-estimates (introduced by P. J. Huber) of the radioimmunoassay curves, and
demonstrate its superiority to the method of least squares in the presence of
outliers and also its similarity to least squares in the absence of outliers.


DISTRIBUTION STATEMENT A
Approved for public release;
Distribution Unlimited

D D C, OCT 22 1976


Research partially supported by a grant from the Office of Naval Research.




1.  INTRODUCTION 

The calibration experiment can be abstracted as follows [5]: the variable
of interest, $X$, is difficult or impossible to measure directly, while a related
variable, $Y$, which is dependent on $X$, is fairly easy to measure. A measurement
is made on the variable $Y$ and an estimate is made of the associated $X$.
Operationally, the relationship between $X$ and $Y$ must be quantified. For a given
value of $X$, $Y$ is distributed about a mean function, $f(X, \boldsymbol{\beta})$,
with a certain dispersion $\sigma$. To determine this functional relationship,
$n$ couplets $(X_1, Y_1), \ldots, (X_n, Y_n)$ are observed at known values of $X$,
where it is assumed that the measurement errors associated with $X$ are negligible
relative to the errors in $Y$. If the form of the function $f$ is known, then these
observations are used to estimate the parameters $\boldsymbol{\beta}$ and $\sigma$.
Subsequently, independent observations $Y_j$, $j = n+1, \ldots, n+m$, are made at
$m$ unknown values of $X$, and the estimated calibration curve $\hat{f}$ is employed
to make inference on the corresponding unknown values $X_j$, $j = n+1, \ldots, n+m$.

The calibration problem occurs in radioimmunoassay, which is a biological
technique devised to measure minute concentrations of various biological substances
which appear in man [9]. The variable $X$ refers to the concentration of the
substance of interest, a concentration sometimes as small as a picogram per
milliliter, thus making direct measurement virtually impossible. The variable $Y$
refers to radioactive counts of bound residue of the assay [6]. A typical clinical
radioimmunoassay is shown in Table 1. Here $n = 14$ (it is usually 12, 14 or 16).
In this paper we concentrate only on obtaining the calibration curve, so we do not
show $Y_j$, $j = n+1, \ldots, n+m$; typically, $m$ can be anywhere between about 50
and 300.

We thank Dr. Jehuda Steinbach of the Nuclear Medicine Department, Veteran's
Hospital, Buffalo, for making the data available to us.
            Y            X
          counts    concentration

    1      7720          0.0
    2      8113          0.0
    3      6664          2.0
    4      6804          2.0
    5      4994          5.0
    6      4948          5.0
    7      3410         10.0
    8      3208         10.0
    9      4478         20.0
   10      2396         20.0
   11      1302         50.0
   12      1377         50.0
   13      1025        100.0
   14      1096        100.0

                Table 1

One of the most common methods for finding the form of the dependence of
$Y$ on $X$ with radioimmunoassay data is the linearization method [4]. Suppose
that the count at $X = 0$ is $Y_0$ (usually estimated by taking the average of the
two observations at zero concentration) and $N$ is the background noise (usually
estimated by placing two "empty" tubes into the counter and taking the average of
the resultant counts). Then define a new response variable by $Y' = (Y - N)/Y_0$.
After making a number of assumptions about the chemical reactions, Rodbard and
Lewald [4] show that

$$\operatorname{logit} Y' = \log\bigl[ Y' / (1 - Y') \bigr] = \alpha' + \beta' \log X + \epsilon , \tag{1}$$

where the errors, $\epsilon$, have zero mean and are heteroscedastic. An example of
this fit is given in Figure 1.

This method is not recommended. Some experiments have too small $Y_0$ or too
large $N$ (since they are measured quantities, they are also subject to error), thus
leading to $Y'$ which are not in the interval (0,1). More disturbing still, the data
are frequently not approximately collinear, as in Figure 2, so that the fitted
straight line is not representative of the data.
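
As a rough illustration of the linearization in (1), the sketch below applies the
logit transform to the Table 1 counts and fits a straight line in log X. Several
things here are assumptions rather than part of the original method: the background
count N is not reported with Table 1, so the value 500 is purely hypothetical;
natural logarithms are used; the fit is unweighted even though the errors in (1) are
heteroscedastic; and the function names are illustrative only. The zero-concentration
tubes are dropped because log 0 is undefined.

```python
import math

# Table 1: (counts Y, concentration X)
TABLE_1 = [(7720, 0.0), (8113, 0.0), (6664, 2.0), (6804, 2.0), (4994, 5.0),
           (4948, 5.0), (3410, 10.0), (3208, 10.0), (4478, 20.0), (2396, 20.0),
           (1302, 50.0), (1377, 50.0), (1025, 100.0), (1096, 100.0)]


def logit(p):
    """Return log(p / (1 - p)); the transform is undefined outside (0, 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"Y' = {p:.3f} is outside (0, 1); logit undefined")
    return math.log(p / (1.0 - p))


def linearized_fit(data, n_background):
    """Unweighted least squares fit of logit Y' = a' + b' * log X."""
    zero_counts = [y for y, x in data if x == 0.0]
    y0 = sum(zero_counts) / len(zero_counts)       # estimate of Y0
    u = [math.log(x) for y, x in data if x > 0.0]  # log X, zero dose dropped
    v = [logit((y - n_background) / y0) for y, x in data if x > 0.0]
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    slope = (sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
             / sum((ui - u_bar) ** 2 for ui in u))
    return v_bar - slope * u_bar, slope            # (a', b')


# n_background=500.0 is a hypothetical value chosen only for illustration.
a_prime, b_prime = linearized_fit(TABLE_1, n_background=500.0)
```

The `ValueError` in `logit` corresponds to the failure described above: a measured
count that exceeds the estimated Y0 (or falls below N) gives a Y' outside (0,1), and
the transform cannot be applied at all.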


2.  THE HYPERBOLIC CURVE


Since the linearization method does not produce acceptable fits, we chose
to inspect the scatter plots of $Y$ versus $X$ for a large number of data sets
(124 in all, including the substances Insulin, Renin, TSH, Digoxin, Folic Acid,
Vitamin B-12, T3 and Gastrin). The data suggest a modified hyperbola,

$$Y = \alpha + \frac{\beta}{1 + \gamma X^{\delta}} + \epsilon , \tag{2}$$

where the $\epsilon$ have mean zero and are homoscedastic. Figures 3 and 4 show both
curves (1) and (2) fitted by least squares; the dotted line is curve (1).

The superiority of the modified hyperbola is evident even though both
curves usually require four parameters; in curve (1) one must include both zero
concentration and background counts as parameters to be estimated. Incidentally,
for Vitamin B-12 and Gastrin one may reduce the number of parameters in (2) by
setting $\delta = 1$ and thus obtaining a rectangular hyperbola. If one assumes zero
error, the two curves, (1) and (2), are mathematically identical ($N = \alpha$,
$Y_0 = \beta$, $\alpha' = -\log \gamma$, $\beta' = -\delta$). Thus the difference
between the two methods is the different error structures assumed and the
flexibility afforded by curve (2) for estimating zero concentration counts and
background noise.
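
The zero-error identity between curves (1) and (2) can be checked numerically. The
sketch below is only an illustration: the parameter values are hypothetical, natural
logarithms are assumed, and the function names are not from the original. With
N = alpha and Y0 = beta, the scaled mean is Y' = 1/(1 + gamma X^delta), so
Y'/(1 - Y') = 1/(gamma X^delta) and logit Y' = -log(gamma) - delta log X.

```python
import math


def modified_hyperbola(x, alpha, beta, gamma, delta):
    """Mean function of model (2): alpha + beta / (1 + gamma * x**delta)."""
    return alpha + beta / (1.0 + gamma * x ** delta)


def logit_of_scaled_mean(x, alpha, beta, gamma, delta):
    """logit of Y' = (f - N) / Y0 with N = alpha and Y0 = beta (no error)."""
    y_prime = (modified_hyperbola(x, alpha, beta, gamma, delta) - alpha) / beta
    return math.log(y_prime / (1.0 - y_prime))


# Hypothetical parameter values, chosen only to exercise the identity.
params = {"alpha": 1000.0, "beta": 7000.0, "gamma": 0.1, "delta": 1.2}

# Largest discrepancy between the logit of curve (2) and the straight line (1)
# with intercept -log(gamma) and slope -delta.
max_gap = max(
    abs(logit_of_scaled_mean(x, **params)
        - (-math.log(params["gamma"]) - params["delta"] * math.log(x)))
    for x in (2.0, 5.0, 10.0, 20.0, 50.0, 100.0)
)
```

`max_gap` is zero up to rounding error, which is the sense in which the two curves
coincide; they differ only once an error structure is attached.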

Having decided that curve (2) is superior to curve (1), the question then
arises as to a reasonable algorithm for estimating the curve parameters. Usually
an ordinary least squares fit of the modified hyperbola yields a reasonable
calibration curve. Testing has shown that an assumption of homoscedasticity for the
errors is valid [7]. Unfortunately, though, data sets such as those displayed in
Figures 5 and 6 too often occur for which one or two points are obviously in
gross error (see also points 2 and 9 in Table 1). The dotted line in both
figures represents the least squares estimate of curve (2). The fit is clearly
not acceptable; it is overly influenced by outlying observations. These outlying
observations could be discarded and the curve refitted by least squares,
but the statistical properties of the resulting curve are difficult to evaluate.
Furthermore, wishing to find an automatic procedure, we prefer to use the following
methodology.
3.  M-ESTIMATES

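A way of curtailing the influence of observations which are overly large or
small is to turn to robust estimation methods. Andrews [1], [2] reports success
with the use of M-estimates [3] in linear regression. To extend the method to
nonlinear regression (see also [8]), consider the least squares estimates of
$\boldsymbol{\beta}$ in model (2), on the basis of observations
$(Y_1, X_1), \ldots, (Y_n, X_n)$. They satisfy

$$\sum_{j=1}^{n} \varphi\bigl( Y_j - f(X_j, \boldsymbol{\beta}) \bigr)\, f_i'(X_j, \boldsymbol{\beta}) = 0 , \qquad i = 1, \ldots, 4 , \tag{3}$$

where, with $\varphi(z) = z$,

$$f(X, \boldsymbol{\beta}) = \alpha + \frac{\beta}{1 + \gamma X^{\delta}} , \qquad \boldsymbol{\beta} = (\alpha, \beta, \gamma, \delta) ,$$

and $f_i'$ is the partial derivative of $f$ with respect to the $i$th component of
$\boldsymbol{\beta}$. Replacing the above $\varphi$ by

$$\varphi(z) = \begin{cases} \sin\bigl( z / (c s) \bigr) , & |z| \le \pi c s , \\ 0 , & |z| > \pi c s , \end{cases}$$

where $s$ is a scale estimate, we obtain the sine-estimate [1], a special case
of the M-estimates. We use $c = 2.1$ and as an estimate of scale

$$s = \operatorname{median}\bigl\{ \text{largest } (n - p + 1) \text{ of the } |Y_j - f(X_j, \boldsymbol{\beta})| \bigr\} ,$$

with $p$ the number of parameters.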
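
The sine function and the scale estimate can be written down directly. The following
is a minimal sketch using only the Python standard library; the function names are
illustrative, and the tuning constant is taken as c = 2.1 on the reading of the text
adopted here.

```python
import math
import statistics

C = 2.1  # tuning constant c, as read from the text


def sine_psi(z, s, c=C):
    """sin(z / (c*s)) for |z| <= pi*c*s, and 0 for larger |z|."""
    if abs(z) <= math.pi * c * s:
        return math.sin(z / (c * s))
    return 0.0


def scale_estimate(residuals, p):
    """Median of the largest (n - p + 1) absolute residuals."""
    ordered = sorted(abs(r) for r in residuals)
    # Dropping the p - 1 smallest values leaves the n - p + 1 largest.
    return statistics.median(ordered[p - 1:])
```

For example, with residuals 1, -2, 3, -4, 5, -6, 7 and p = 4, the four largest
absolute residuals are 4, 5, 6, 7, so `scale_estimate` returns their median, 5.5.
Any residual beyond pi * c * s in absolute value contributes nothing to
equations (3), which is how gross errors lose their influence.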

Since equations (3) are nonlinear, we must rely on an iterative algorithm to
solve them for the sine estimate. The iteratively reweighted least squares
algorithm we use to solve equations (3) is similar to the Gauss-Newton
algorithm. Denoting the $k$th iterate by $\boldsymbol{\beta}^{(k)}$, then
$\boldsymbol{\beta}^{(k+1)}$ is the solution to the following normal equations:
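
$$\sum_{j=1}^{n} w_j^{(k)} f_{ji}^{(k)} \Bigl[ r_j^{(k)} - \sum_{l=1}^{4} f_{jl}^{(k)} \bigl( \beta^{(k+1)}(l) - \beta^{(k)}(l) \bigr) \Bigr] = 0 , \qquad i = 1, \ldots, 4 ,$$

where, for $j = 1, \ldots, n$ and $i = 1, \ldots, 4$,

$$r_j^{(k)} = Y_j - f(X_j, \boldsymbol{\beta}^{(k)}) , \qquad w_j^{(k)} = \varphi\bigl( r_j^{(k)} \bigr) / r_j^{(k)} , \qquad f_{ji}^{(k)} = f_i'(X_j, \boldsymbol{\beta}^{(k)}) ,$$

$\beta^{(k)}(i)$ denotes the $i$th component of $\boldsymbol{\beta}^{(k)}$, and the
scale $s^{(k)}$ used in $\varphi$ is computed from the residuals $r_j^{(k)}$.
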
Without modification, this algorithm failed to converge (that is, to satisfy
$|\beta^{(k+1)}(i) - \beta^{(k)}(i)| < \epsilon\, |\beta^{(k)}(i)|$,
$i = 1, \ldots, 4$) in 9 of the 124 data sets. On inspection it was
found that these data sets contained too many gross inaccuracies. The calculations
took about 50% more time than the ordinary Gauss-Newton method (part of this time
is undoubtedly due to the calculation of the sine function; simpler weight functions
are available). To obtain starting values we used the following facts:
(i) $f(\infty, \boldsymbol{\beta}) = \alpha$ if $\delta > 0$,
(ii) $f(0, \boldsymbol{\beta}) = \alpha + \beta$ if $\delta > 0$,
(iii) $f'(0, \boldsymbol{\beta}) = -\beta\gamma$ (the derivative with respect to $X$)
if $\delta = 1$, (iv) use $\delta = 1$ as a starting value, since, usually,
$0.5 \le \delta \le 2.5$.
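
One way these four facts might be turned into numbers is sketched below, using the
Table 1 data. How each fact is applied here is an assumption: the asymptote in (i) is
approximated by the mean count at the largest concentration, (ii) uses the mean count
at zero concentration, and the slope in (iii) is a finite difference between the two
smallest concentrations. The function name is illustrative only.

```python
from collections import defaultdict

# Table 1: (counts Y, concentration X)
TABLE_1 = [(7720, 0.0), (8113, 0.0), (6664, 2.0), (6804, 2.0), (4994, 5.0),
           (4948, 5.0), (3410, 10.0), (3208, 10.0), (4478, 20.0), (2396, 20.0),
           (1302, 50.0), (1377, 50.0), (1025, 100.0), (1096, 100.0)]


def starting_values(data):
    """Crude starting values (alpha, beta, gamma, delta) with delta fixed at 1."""
    groups = defaultdict(list)
    for y, x in data:
        groups[x].append(y)
    doses = sorted(groups)                     # assumes the smallest dose is X = 0
    mean_at = {x: sum(groups[x]) / len(groups[x]) for x in doses}
    alpha0 = mean_at[doses[-1]]                # (i): f(inf) = alpha, approximated
    beta0 = mean_at[doses[0]] - alpha0         # (ii): f(0) = alpha + beta
    slope0 = (mean_at[doses[1]] - mean_at[doses[0]]) / (doses[1] - doses[0])
    gamma0 = -slope0 / beta0                   # (iii): f'(0) = -beta * gamma
    return alpha0, beta0, gamma0, 1.0          # (iv): delta = 1


alpha0, beta0, gamma0, delta0 = starting_values(TABLE_1)
```

For Table 1 this gives alpha0 = 1060.5 and beta0 = 6856.0, with gamma0 close to 0.086.
Because the curve has not yet flattened at the largest concentration, alpha0 obtained
this way tends to overstate the asymptote; it is meant only as a place to start the
iteration.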


Figures 5, 6 and 7 show examples of the resultant fitted curve (solid
line). Figure 7, which does not have any outliers, also displays the least squares
line, which is indistinguishable from the robust line. These are examples of the
general result we found: if there are outliers, the robust curve is not usually
influenced by these values; if there are no outliers, then the robust curve is
similar to least squares.